For this assignment, we investigate serious incidents that required a fire department response. First, using data on the locations of firehouses and fires in New York City, we want to know whether response times to fires differ across the city. Second, we focus on one variable that could affect response times, the distance from the nearest firehouse, and check whether we find the expected effect.
To keep this homework manageable, I am leaving out another part of the investigation: what effect do the demographic and income characteristics of a neighborhood have on response times? This question is likely more sensitive, but it is also relevant from a public policy perspective.
We rely on two data sets.
NYC Open Data has data on all incidents responded to by fire companies. I have included the variable description file in the exercise folder. The following variables are available:
This dataset is updated only annually, and so far it contains data only from 2013 to 2015. The full dataset (1.3M rows) is also somewhat too large for an exercise, so I suggest limiting yourself to a subset. I have added a file containing only the most severe incidents (Level 7, all hands) for 2015, which yields 2,335 incidents.
Unfortunately, the incident addresses had not yet been geocoded. Ideally, I would like you to know how to do this, but I am mindful of the hour or so it takes. So, here is the code. The geocodes (for the addresses where geocoding succeeded) are part of the data.
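If you want to build a different subset from the full export yourself, the filtering step could look roughly like the sketch below. It is written in Python rather than R, and it rests on several assumptions you should check against the variable description file. The severity label is assumed to be in `HIGHEST_LEVEL_DESC` with a leading code such as `"7 - ..."`. Timestamps are assumed to look like `03/14/2015 02:10:00 PM`. The sample rows are made up.

```python
import csv
import io
from datetime import datetime

# ASSUMPTIONS (verify against the variable description file):
# severity is stored in HIGHEST_LEVEL_DESC with labels like "7 - ...",
# and timestamps look like "03/14/2015 02:10:00 PM".
SEVERE_CODES = {"7"}
DATE_FMT = "%m/%d/%Y %I:%M:%S %p"

def severe_incidents(rows, year=2015):
    """Keep rows whose severity code is in SEVERE_CODES and whose incident year matches."""
    kept = []
    for row in rows:
        code = row["HIGHEST_LEVEL_DESC"].split(" - ")[0].strip()
        when = datetime.strptime(row["INCIDENT_DATE_TIME"], DATE_FMT)
        if code in SEVERE_CODES and when.year == year:
            kept.append(row)
    return kept

# Made-up rows standing in for the full 1.3M-row export.
sample = io.StringIO(
    "IM_INCIDENT_KEY,HIGHEST_LEVEL_DESC,INCIDENT_DATE_TIME\n"
    "1,7 - All Hands Working,03/14/2015 02:10:00 PM\n"
    "2,1 - Initial Alarm,03/14/2015 02:15:00 PM\n"
    "3,7 - All Hands Working,11/02/2014 09:00:00 AM\n"
)
subset = severe_incidents(csv.DictReader(sample))
print([row["IM_INCIDENT_KEY"] for row in subset])
```

With a real file, pass `csv.DictReader(open(path, newline=""))` instead. `DictReader` streams row by row, so only the kept subset is held in memory.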
NYC Open Data also provides data on the location of all 218 firehouses in NYC. Relevant for our analysis are the following variables:
FacilityName, Borough, Latitude, Longitude
Provide a leaflet map of the severe fires contained in the file severe_incidents.csv. Ignore locations that fall outside the five boroughs of New York City. Provide at least three pieces of information on the incident in a popup.
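One way to "ignore locations outside the five boroughs" is a point-in-polygon test against borough boundary shapes (for example, the borough boundaries published on NYC Open Data). It also helps to drop incidents whose geocode is missing. The Python sketch below implements the classic ray-casting test. The square used as a "borough" is a toy stand-in, not a real boundary. Real borough outlines are multi-part polygons, so in practice a GIS library (e.g. `sf` in R) is the more robust choice.

```python
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]  # (longitude, latitude)

def point_in_polygon(pt: Point, polygon: Sequence[Point]) -> bool:
    """Ray casting: count how many polygon edges a ray going east from pt crosses."""
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through pt can be crossed.
        if (y1 > y) != (y2 > y):
            # x-coordinate where this edge meets that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def in_any_borough(lon: Optional[float], lat: Optional[float], boroughs) -> bool:
    """False for missing geocodes; True if the point lies inside any borough polygon."""
    if lon is None or lat is None:
        return False
    return any(point_in_polygon((lon, lat), poly) for poly in boroughs)

# Toy stand-in for real borough outlines (NOT actual boundaries).
toy_boroughs = [[(-74.0, 40.7), (-73.9, 40.7), (-73.9, 40.8), (-74.0, 40.8)]]
incidents = [(-73.95, 40.75), (-74.5, 40.75), (None, None)]
kept = [p for p in incidents if in_any_borough(p[0], p[1], toy_boroughs)]
print(kept)
```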
Start with the previous map. Now, distinguish the markers of the fire locations by PROPERTY_USE_DESC, i.e. what kind of property was affected. If there are too many categories, collapse some categories. Choose an appropriate coloring scheme to map the locations by type of affected property. Add a legend informing the user about the color scheme. Also make sure that the information about the type of affected property is now contained in the popup information. Show this map.
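Collapsing categories usually means keeping the most frequent property types and lumping the rest into "Other". After that, each remaining category gets a color from a qualitative palette. The Python sketch below shows both steps, assuming keep-the-top-N as the collapsing rule. The palette is the first five colors of ColorBrewer's qualitative "Dark2" scheme, and the property labels are illustrative, not the dataset's exact `PROPERTY_USE_DESC` strings. In R, `leaflet::colorFactor()` plays a similar role, and the same mapping can feed `addLegend()`.

```python
from collections import Counter

# First five colors of ColorBrewer's qualitative "Dark2" scheme.
PALETTE = ["#1b9e77", "#d95f02", "#7570b3", "#e7298a", "#66a61e"]

def collapse_categories(values, keep=4, other="Other"):
    """Keep the `keep` most frequent categories and lump the rest into `other`."""
    top = {cat for cat, _ in Counter(values).most_common(keep)}
    return [v if v in top else other for v in values]

def color_map(categories, other="Other"):
    """One color per category: named categories alphabetically, `other` last (for the legend)."""
    named = sorted({c for c in categories if c != other})
    ordered = named + ([other] if other in categories else [])
    return {cat: PALETTE[i % len(PALETTE)] for i, cat in enumerate(ordered)}

# Illustrative labels, not the dataset's exact PROPERTY_USE_DESC strings.
uses = ["1 or 2 family dwelling", "Multifamily dwelling", "Multifamily dwelling",
        "Store", "Vacant lot", "Garage", "1 or 2 family dwelling", "Multifamily dwelling"]
collapsed = collapse_categories(uses, keep=2)
colors = color_map(collapsed)
print(colors)
```

Keeping at most four named categories plus "Other" means no palette color is reused.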
Add marker clustering, so that zooming in will reveal the individual locations but the zoomed out map only shows the clusters. Show the map with clusters.
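In R's leaflet, clustering is typically switched on by passing `clusterOptions = markerClusterOptions()` to the marker layer. To illustrate the principle only, the Python sketch below groups points into grid cells whose size halves at each zoom level, so clusters split apart as you zoom in. This simplified grid approach is a swapped-in illustration, not the Leaflet.markercluster plugin's actual algorithm. The `base_cell_deg` parameter is a hypothetical choice.

```python
from collections import defaultdict

def grid_clusters(points, zoom, base_cell_deg=1.0):
    """Group (lon, lat) points into grid cells whose side halves at each zoom level.

    Returns a list of (centroid_lon, centroid_lat, count) tuples, one per cluster.
    """
    cell = base_cell_deg / (2 ** zoom)
    buckets = defaultdict(list)
    for lon, lat in points:
        # Points in the same cell at this zoom level end up in one cluster.
        buckets[(int(lon // cell), int(lat // cell))].append((lon, lat))
    clusters = []
    for members in buckets.values():
        n = len(members)
        clusters.append((sum(p[0] for p in members) / n,
                         sum(p[1] for p in members) / n,
                         n))
    return clusters

pts = [(-73.95, 40.75), (-73.951, 40.751), (-73.80, 40.60)]
for z in (0, 5, 10):
    print(z, sorted(c[2] for c in grid_clusters(pts, z)))
```

At zoom 0 all three points form one cluster. At zoom 5 the two nearby points still share a cluster, and at zoom 10 every point is shown individually.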
#### 3. Fire Houses

The second data file contains the locations of the 218 firehouses in New York City. Start with the non-clustered map (2b) and now adjust the size of the circle markers by severity (TOTAL_INCIDENT_DURATION or UNITS_ONSCENE seem plausible options). More severe incidents should have larger circles on the map. On the map, also add the locations of the fire houses. Add two layers (“Incidents”, “Firehouses”) that allow the user to select which information to show.
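When circle size encodes severity, readers tend to judge circle area rather than radius. A square-root scale therefore keeps the visual impression proportional to the value. The Python sketch below assumes the severity variable is something like UNITS_ONSCENE and that radii are in pixels. In R's leaflet, `addCircleMarkers()` takes its `radius` in pixels, so a value computed this way can be passed directly. The `max_px` and `min_px` defaults are hypothetical choices.

```python
import math

def radius_for(value, max_value, max_px=15.0, min_px=2.0):
    """Circle radius in pixels, with circle area proportional to `value` (square-root scale)."""
    r = max_px * math.sqrt(value / max_value)
    return max(r, min_px)  # keep very small incidents visible on the map

# E.g. UNITS_ONSCENE values; the largest observed value gets the largest circle.
units = [4, 16, 64]
print([radius_for(u, max(units)) for u in units])
```

Quadrupling the number of units doubles the radius, which quadruples the circle's area.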
We now want to investigate whether the distance of the incident from the nearest firehouse varies across the city.
For all incident locations, identify the nearest firehouse and calculate the distance between the firehouse and the incident location. Provide a scatter plot of this distance against the time until the first engine arrived (the variables INCIDENT_DATE_TIME and ARRIVAL_DATE_TIME will be helpful). If there are any interesting patterns to highlight, feel free to do so.
Code adapted from the linked source.
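Independently of the adapted code, here is a minimal Python sketch of the two quantities the scatter plot needs. The first is the great-circle (haversine) distance to the nearest firehouse, found by brute force. That approach is cheap here: 2,335 incidents × 218 firehouses is about 509,000 distance computations. The second is the response time in minutes. Several things are assumptions: the timestamp format, the use of a spherical Earth with radius 6371 km, and the made-up firehouse names and coordinates. `FacilityName`, `Latitude` and `Longitude` come from the firehouse file, but CSV values arrive as strings and need `float()` first.

```python
import math
from datetime import datetime

EARTH_RADIUS_KM = 6371.0
DATE_FMT = "%m/%d/%Y %I:%M:%S %p"  # ASSUMED timestamp format

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def nearest_firehouse(lat, lon, firehouses):
    """Return (FacilityName, distance_km) of the closest firehouse, by brute force."""
    def dist(f):
        return haversine_km(lat, lon, f["Latitude"], f["Longitude"])
    best = min(firehouses, key=dist)
    return best["FacilityName"], dist(best)

def response_minutes(incident_dt, arrival_dt):
    """Minutes from INCIDENT_DATE_TIME to ARRIVAL_DATE_TIME."""
    start = datetime.strptime(incident_dt, DATE_FMT)
    end = datetime.strptime(arrival_dt, DATE_FMT)
    return (end - start).total_seconds() / 60

# Made-up firehouses (coordinates already converted to float).
firehouses = [
    {"FacilityName": "Engine A (made up)", "Latitude": 40.75, "Longitude": -73.95},
    {"FacilityName": "Engine B (made up)", "Latitude": 40.80, "Longitude": -73.95},
]
name, dist = nearest_firehouse(40.76, -73.95, firehouses)
minutes = response_minutes("03/14/2015 02:10:00 PM", "03/14/2015 02:14:30 PM")
print(name, round(dist, 2), minutes)
```

Each (distance, minutes) pair is one point in the scatter plot. Incidents with a missing ARRIVAL_DATE_TIME or a negative difference are worth flagging rather than silently plotting.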
Provide a map visualization of response times. Feel free to differentiate by incident type, affected property, etc., if that is interesting.
Note: In the visualization below, the radius of each circle represents the response time.
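A map shows where response times are long. A small table of median response times per group complements it by checking whether the differences across groups are more than a visual impression. Medians are less sensitive than means to a few extreme delays. The Python sketch below assumes each incident record already carries a collapsed property category and a response time in minutes (both keys are hypothetical), with `None` for missing arrivals.

```python
from collections import defaultdict
from statistics import median

def median_by_group(records, group_key, value_key="response_min"):
    """Median of value_key per group, skipping missing values; slowest group first."""
    groups = defaultdict(list)
    for rec in records:
        val = rec.get(value_key)
        if val is not None:
            groups[rec[group_key]].append(val)
    summary = {g: median(vals) for g, vals in groups.items()}
    return dict(sorted(summary.items(), key=lambda kv: kv[1], reverse=True))

# Made-up records; "property" and "response_min" are hypothetical keys.
records = [
    {"property": "Residential", "response_min": 4.0},
    {"property": "Residential", "response_min": 6.0},
    {"property": "Commercial", "response_min": 3.0},
    {"property": "Commercial", "response_min": None},
]
print(median_by_group(records, "property"))
```

The same function works for comparing boroughs if each record carries a borough field.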